Unsupervised Stability-Based Ensembles to Discover Reliable Structures in Complex Bio-molecular Data

Authors

  • Alberto Bertoni
  • Giorgio Valentini
Abstract

The assessment of the reliability of clusters discovered in bio-molecular data is a central issue in several bioinformatics problems. Several methods based on the concept of stability have been proposed to estimate the reliability of each individual cluster as well as the "optimal" number of clusters. In this conceptual framework, a clustering ensemble is obtained through bootstrapping techniques, noise injection into the data, or random projections into lower-dimensional subspaces. A measure of the reliability of a given clustering is then obtained through specific stability/reliability scores based on the similarity of the clusterings composing the ensemble. Classical stability-based methods do not provide an assessment of the statistical significance of the clustering solutions and are not able to directly detect multiple structures (e.g. hierarchical structures) simultaneously present in the data. Statistical approaches based on the chi-square distribution and on the Bernstein inequality show that stability-based methods can be successfully applied to the statistical assessment of the reliability of clusters and to the discovery of multiple structures underlying complex bio-molecular data. In this paper we provide an overview of stability-based methods, focusing on stability indices and statistical tests that we recently proposed in the context of the analysis of gene expression data.
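The general recipe described in the abstract (perturb the data, cluster each perturbed copy, score the similarity of the resulting clusterings) can be illustrated with a minimal sketch. This is not the authors' method: it assumes noise injection as the perturbation, a plain k-means with a deterministic farthest-point initialization as the base clusterer, and the mean pairwise Rand index as the stability score. The function names, the toy data, and all parameter values are hypothetical, and the statistical tests mentioned in the abstract are not implemented.

```python
import random
from itertools import combinations


def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with farthest-point initialization (an assumption
    made here to keep the sketch deterministic). Returns one label per point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def stability(points, k, n_runs=10, sigma=0.2, seed=0):
    """Mean pairwise Rand index over an ensemble of clusterings,
    each obtained from a noise-injected copy of the data."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_runs):
        noisy = [tuple(x + rng.gauss(0, sigma) for x in p) for p in points]
        ensemble.append(kmeans(noisy, k))
    scores = [rand_index(a, b) for a, b in combinations(ensemble, 2)]
    return sum(scores) / len(scores)


# Toy data (hypothetical): two well-separated 4x4 grids of points.
blob = [(0.3 * i, 0.3 * j) for i in range(4) for j in range(4)]
data = blob + [(x + 10.0, y + 10.0) for x, y in blob]

s2 = stability(data, 2)
s3 = stability(data, 3)
print("stability k=2:", s2)
print("stability k=3:", s3)
```

On data like this, the two-cluster solution should be reproduced under every perturbation, while a three-cluster solution has to split one group in a way that can change with the noise, so its score would be expected to be no higher. A score alone does not say whether such a difference is statistically significant, which is the gap the chi-square and Bernstein-based tests in the abstract are meant to address.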


Similar resources

Statistical methods for the assessment of clusters discovered in bio-molecular data

The assessment of the reliability of clusters discovered in bio-molecular data is a central issue in several bioinformatics problems, ranging from the definition of new taxonomies of malignancies based on bio-molecular data, to the validation of clusters of co-regulated or co-expressed genes, or the discovery of functional relationships from protein-protein interaction data. Recently, several m...


Electronic Structure Investigation of Octahedral Complex and Nano ring by NBO Analysis: An EPR Study

To calculate the non-bonded interaction of the [CoCl6]3- complex embedded in a nano ring, we focus on the single-wall boron-nitride B18N18 nano ring. Thus, the geometry of the B18N18 nano ring has been optimized by the B3LYP method with the EPR-II (electron paramagnetic resonance) basis set, and the geometry of the [CoCl6]3- complex has been optimized by the B3LYP method with Aldrich’s VTZ basis set and Stuttgart RSC 1...


High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...


An Unsupervised Learning Method for an Attacker Agent in Robot Soccer Competitions Based on the Kohonen Neural Network

The RoboCup competition, as a great test-bed, has become a popular domain worldwide in recent years. The main objective of such competitions is to deal with the complex behavior of systems which consist of multiple autonomous agents. The rich experience of human soccer players can be used as a valuable reference for a robot soccer player. However, because of the differences between real and simulated soc...



Publication date: 2008